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Abstract 

Background: In the analysis of effects by cell treatment such as drug dosing, identifying changes on gene 
network structures between normal and treated cells is a key task. A possible way for identifying the changes is to 
compare structures of networks estimated from data on normal and treated cells separately. However, this 
approach usually fails to estimate accurate gene networks due to the limited length of time series data and 
measurement noise. Thus, approaches that identify changes on regulations by using time series data on both 
conditions in an efficient manner are demanded. 

Methods: We propose a new statistical approach that is based on the state space representation of the vector 
autoregressive model and estimates gene networks on two different conditions in order to identify changes on 
regulations between the conditions. In the mathematical model of our approach, hidden binary variables are newly 
introduced to indicate the presence of regulations on each condition. The use of the hidden binary variables 
enables an efficient data usage; data on both conditions are used for commonly existing regulations, while for 
condition specific regulations corresponding data are only applied. Also, the similarity of networks on two 
conditions is automatically considered from the design of the potential function for the hidden binary variables. 
For the estimation of the hidden binary variables, we derive a new variational annealing method that searches the 
configuration of the binary variables maximizing the marginal likelihood. 

Results: For the performance evaluation, we use time series data from two topological^ similar synthetic networks, 
and confirm that our proposed approach estimates commonly existing regulations as well as changes on 
regulations with higher coverage and precision than other existing approaches in almost all the experimental 
settings. For a real data application, our proposed approach is applied to time series data from normal Human 
lung cells and Human lung cells treated by stimulating EGF-receptors and dosing an anticancer drug termed 
Gefitinib. In the treated lung cells, a cancer cell condition is simulated by the stimulation of EGF-receptors, but the 
effect would be counteracted due to the selective inhibition of EGF-receptors by Gefitinib. However, gene 
expression profiles are actually different between the conditions, and the genes related to the identified changes 
are considered as possible off-targets of Gefitinib. 

Conclusions: From the synthetically generated time series data, our proposed approach can identify changes on 
regulations more accurately than existing methods. By applying the proposed approach to the time series data on 
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normal and treated Human lung cells, candidates of off-target genes of Gefitinib are found. According to the 
published clinical information, one of the genes can be related to a factor of interstitial pneumonia, which is 
known as a side effect of Gefitinib. 



Background 

Gene network estimation from time series gene expres- 
sion data is a key task for elucidating cellular systems. 
Thus far, wide variety of approaches have been proposed 
based on the vector autoregressive (VAR) model [1,3], 
the state space model [4-6], and the dynamic Bayesian 
network [7,8]. Recently, time series gene expression data 
on multiple conditions aiming at analyzing effects of cell 
treatment such as drug dosing and heat shock are avail- 
able. We here assume that some gene regulations are 
disrupted but many of the gene regulations do not 
change due to some treatment of interest, and try to 
find a small number of changes on regulations as keys 
for elucidating effects by the treatment. 

A possible way for finding changes on regulations is to 
estimate networks from two data sets separately and 
then compare their structures. However, due to the lim- 
ited length of time series data (usually less than 10 time 
points) and unignorable measurement noise, networks 
are estimated with high error rates and the estimation 
errors cause the serious failure on identifying changes 
on regulations. Thus, approaches using two time series 
data in an efficient manner are strongly demanded. 
Also, widely used statistical methods such as the VAR 
model and dynamic Bayesian network assume equally 
spaced time points in time series data. However, 
observed time points on usually available time series 



Table 1 Comparison of the variation annealing 
(Proposed) and EM algorithm (EM) based on the 
proposed model 



(a) 


# of time points 




50 






25 






# TP 


# FP 


PRE 


# TP 


# FP 


PRE 


Proposed 


295.9 


41.7 


0.88 


238.4 


71.6 


0.77 


EM 


294.9 


119.2 


0.71 


196.8 


66.9 


0.75 


(b) 


# of time points 




50 






25 






# TP 


# FP 


PRE 


# TP 


# FP 


PRE 


Proposed 


39.8 


13.2 


0.75 


23.4 


20.8 


0.53 


EM 


39.9 


39.4 


0.5 


11.5 


10.0 


0.53 



(a) The number of true positives (# TP) and false positives (# FP) of estimated 
regulations in two network models by the proposed approach and EM for 
equally spaced time series data. PRE denotes the precision of the results. 
Regulations in two networks are 305 in total, (b) The number of true positives (# 
TP) and false positives (# FP) of changes on regulations between two network 
models estimated by the proposed approach and EM for equally spaced time 
series data. The regulations changed in two networks are in total 47. 



data are not equally spaced [5,6,9], and approaches that 
can handle unequally spaced time series data in a theo- 
retically correct way should be considered. 

We propose a new statistical model that estimates 
gene networks on two different conditions in order to 
identify changes on regulations between the conditions. 
As the basis of the proposed model, we employ the 
state space representation for VAR model (VAR-SSM), 
in which observation noise is considered between the 
measured or observed gene expressions and the true 
gene expressions in observation model and gene regula- 
tions between true gene expressions are considered in 
the system model [10]. The VAR-SSM can handle 
unequally spaced time series data by ignoring observa- 
tion model on the non-observed time points. For con- 
sidering the changes on regulations, we introduce 
hidden variables to the VAR-SSM in order to indicate 
the presence of regulations in each condition. If hidden 
binary variables on two conditions indicating the pre- 
sence of a regulation are both estimated as one, the reg- 
ulation is considered as a commonly existing regulation. 
On the other hand, if only one of the hidden binary 
variables for the regulation is estimated as one, the reg- 
ulation is considered as a condition specific regulation. 
We also introduce a potential function between the hid- 
den binary variables that is designed to take high prob- 
ability if the hidden binary variables on two conditions 
take the same value. From the design of the potential 
function, the similarity of networks on two conditions is 
automatically considered. Since the time series data on 
both conditions are used for estimating commonly 
existing regulation due to the use of the hidden binary 
variables, an efficient data assignment is achieved. In 
addition, from the more accurate estimation of com- 
monly existing regulations by the efficient data assign- 
ment, accurate identification of changes on regulations 
is induced. 

The hidden binary variables are estimated by search- 
ing the configuration of binary variables that maximizes 
the marginal likelihood of the model. However, search- 
ing the optimal configuration is computationally intract- 
able. Thus, as an alternative approach, we derive a new 
variational annealing method based on [11] in order to 
estimate the hidden binary variables. We also give a 
proof for the effectiveness of the variational annealing 
compared to other candidate alternatives, the variational 
annealing and the EM algorithm, in order to show the 
validity of using the variational annealing. 
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For the performance evaluation, we generate two reg- 
ulatory networks in such a way that most of the regula- 
tions commonly exist and some exist only on one of the 
networks. We then apply our proposed approach and 
existing var model based and dynamic Bayesian network 
based approaches to two equally spaced time series data 
drawn separately from the generated networks. From 
the comparisons of true positive rates and false positive 
rates of these approaches, we confirm the effectiveness 
of our approach. We also generate unequally spaced 
time series data from these networks, and show that our 
approach works correctly on unequally spaced time ser- 
ies data while the performance of the existing 
approaches assuming equally spaced time points is dras- 
tically worsened. 

Our proposed approach is used to analyze changes on 
regulations in gene networks between normal Human 
lung cells and Human lung cells treated by stimulating 
EGF-receptors and dosing an anticancer drug termed 
Gefitinib. A lung cancer condition is simulated by the 
stimulation of EGF-receptors in the treated cells. Since 
Gefitinib is known as a selective inhibitor of EGF-recep- 
tors, the stimulation of EGF-receptors would be coun- 
teracted by Gefitinib, and hence the treated cells are 
expected to be the same condition as normal cells. 
However, gene expression profiles from normal and 
treated cells are actually different, and off-targets of 
Gefitinib causing unexpected positive or negative effects 
are implied. We focus on genes with changes on regula- 
tions between the networks estimated by our approach 
and find possible off-target genes of Gefitinib. According 
to the published clinical information, one of the possible 
off-target genes is suggested as one of factors of intersti- 
tial pneumonia, which is known as a side effect of 
Gefitinib. 

Methods 

Vector autoregressive model and its state space 

representation 

Vector autoregressive model 

Given gene expression profile vectors of p genes during 
T time points {y lt y T }, the first order vector autore- 
gressive (VAR(l)) model at time point t is given by 

Yt =A Yt-i +e i> 

where A is a p x p autoregressive coefficient matrix, 
and e t is observation noise at time t and follows 
Af(0, diagtof, cfp]\ a normal distribution with mean 
0 and variance diag[crf , The (/, ;)th element of A, 
A ip indicates a temporal regulation from the yth gene to 
the /th gene, and if A t j * 0, regulation from the yth gene 
to the ith gene is considered. By examining whether A t j 
is zero or not for all i and a gene network is 



constructed. Since equally space time points are 

assumed in the VAR model, it has difficulty on handling 

unequally spaced time series data. 

State space representation of VAR model (VAR-SSM) 

Let T be the set of equally spaced entire T time points 

and Tobs the set of time points where gene expressions 

are observed. Note that T 0 bs Q T holds. VAR-SSM is 

comprised of two models: system model and observation 

model. Let x t be hidden variable vector representing 

true gene expression at time t. The system model is 

given as the VAR model of x t : 

x t = Ax t -i + rj t , t e T, 

where T[ t is the system noise normally distributed with 
mean 0 and variance H - dmg[hi,...,h p ]. The observation 
model represents measurement error of observed gene 
expression y t and true gene expression x t at observed 
time point t e T 0 bs : 

y t = xt + p v t e Tobs, 

where p t is the observation noise normally distributed 
with mean 0 and variance R = diaglr^..., r p ]. Unequally 
spaced time series data are handled by ignoring observa- 
tion model at non-observed time points. 

Joint model of VAR-SSM for two time series data 

(c) 

Let {y t } te7 -w be time series gene expression data on cell 
condition °c\ where is the set of observed time 

' obs 

points on cell condition c. We also let 7~( c ) be the set of 
time points from 1 to T^, where = max{t e T^}}- 
Given time series data on two types of cell conditions c 
= 1 and 2, we propose a new VAR-SSM model to esti- 
mate gene networks in the two conditions as well as 
identify changes on regulations between them. The 
model is comprised of the following two equations: 

where ° denotes the Hadamard product, £ (c) is a p x p 
binary matrix, and ^ and pf> are respectively system 
and observation noises from J\f(0,H) and Af(0,R). In 
this model, the (/, ;)th element takes one if regula- 
tion from gene ; to gene i exists on condition c and 
zero otherwise, i.e., the presence of regulations is con- 
trolled by £ (c) and the AR coefficient matrix A is com- 
monly used in conditions 1 and 2. Changes on 
regulations are identified when regulations exist only in 
a condition. 

The complete likelihood of our model, P(Y, X, 0, £), 
where Y, X, 0, and E are respectively the sets of y[ c \x[ c \ 
parameters, and £ (c) , is given by the following equation: 
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P(Y,X,®,E) = f\ FT l ^^eK P \--{x\ c) -AoE^ c }jH-\x^ 

c= i terM V2tt I 2 J 



>< n 



I . - P exp --(y! CJ - xfyR- 1 {yf - xf) P{&, E), 



where the prior distribution P(0, E) is assumed to be 
factorized as 

P(0,E) = flPPuJPfrOnW^'^'^)- 

i j 

Here, is a parameter for a potential function of E^p 

1 

given by 



Figure 1 shows a graphical representation of the pro- 
posed model, where dependency of the parameters and 
variables are indicated. The hyperparameters are 
omitted and the observed data y t is represented by gray 
nodes. In the observed data and y^p are propagated 



mainly via hidden variables E\- L) and E?l Due to the 
data propagation, more accurate estimation is expected 
in the proposed model than the approaches considering 
data on two conditions independently. 

For the parameter estimation, we search the config- 
uration of E maximizing the following marginal likeli- 
hood: 



and E\. ] defined later. The prior distributions of Ay is 
oy 



P{Ay\h if E\p, Ef) = N{A ifl 0, h • cupAfiAif, 0, hi - ao) 1 - 

where a 0 and a x are parameters controlling the 
shrinkage of coefficients A and Fy is a binary variable 
that takes 1 if E^p or E^p takes 1 and 0 otherwise, i.e., 

Fy is given by 1 — Ylc=i (1 ~~ ^ij)- a i ls set to a l ar g e 
value, while a 0 is set to smaller than a x . From the 
design of the prior for A ( p if E^p or E\p takes one, i.e., 
there exists regulation from gene ; to i in condition 1 or 
2, weaker prior N{Aif, 0, h x ;• a\) is selected and the 
shrinkage of the coefficients are avoided. Otherwise 
stronger prior Af{Ay; 0, hi • ao) is selected and the spar- 
sity of the network structures is promoted. The prior 
distributions of h t , and r t are given by 

P{hi)=ig{hi',u 0 ,k 0 ), 

p{n)=ig{n;v 0f i 0 ) f 

where XQ represents the density function of inverse 
gamma distribution, u 0 and v 0 are the shape parameters, 
and I< 0 and / 0 are the inverse scaling parameters. Under 
the assumption that a small number of regulations 
change between two conditions, we design the prior dis- 
tribution for E\p and Ef\p{Ef\Ef\ Zij ) by using the fol- 
lowing potential function between E^p for c = 1 or 2: 



l-Zi 



if £?> 4^ 

otherwise 



In this setting, if z^ is small, E\- ] and E\- ] tend to take 



f and E f 

the same value and thus most or the regulations exist in 
both two conditions. We also introduce a prior distribu- 
tion for Zy by beta distribution with parameters Cto an d 

Cm 



1 



where is the beta function. Thus, the prior distri- 
bution of E\- } is given by 



argmax 

E 



j dX j d@P{Y,X,@,E). 



(1) 



Finding the optimal configuration of E is computa- 
tionally intractable, and heuristics approaches such as 
the EM algorithm and the variational method are used 
in practice. Here, we use the variational annealing, an 
extension of the deterministic annealing for discrete 
variables [11]. In the next section, we give a small expla- 
nation of the variational annealing and show its effec- 
tiveness compared to the EM algorithm and the 
variational method. 

Parameter estimation by variational annealing 

In the deterministic annealing, optimization problem is 
solved while gradually changing temperature in a some 
schedule, and maximum likelihood estimator is obtained 
like the EM algorithm [12-14]. Yoshida and West pro- 
posed to use the deterministic annealing to find the 
configuration of the binary variables that maximizes the 
likelihood of factor models with sparseness priors [11]. 
We derive a new variational annealing method by 
extending Yoshida and West's approach to find the con- 
figuration of the binary variables on marginal likelihood 
function, which can be applied for searching E that 
maximizes Equation (1). 

Let £, X, and 0 be p dimensional binary variables, 
unobserved variables, and parameters, respectively, and 
consider to search E maximizing the following marginal 
likelihood: 



max log / dX I d®P(X,®,E). 

Ee{0,l} p J 



(2) 



The maximum of the marginal likelihood on E is 
bounded by the following formula: 

E max log JdxJ d®P{X, G, £) > r log J^^dE^J dxj d&P{X, 0, £) 

] - 0) 
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H 



v 




where r is called temperature and the equality holds 
for r — > +0. Hereafter, the integral range of E is 
omitted if no confusion occurs. Let Q(E) be a normal- 
ized non-negative function, i.e., Q(E) > 0 and J Q(E)dE 
= 1. 

From the Gibbs inequality, the right side of Equation 
(3) is also bounded: 



Here, Q(E) is considered as an approximation function 

of f p^!ir dE r Under the assumption that P(X,®,E) « Q 

(X)Q(0), where Q(X) and Q(0) are normalized non- 
negative functions, we have the following inequality 
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/, EQ(£)lo8 MZ™ ././^/^e^^^. ( 4 ) 

Thus, as an approximation of E maximizing Equation 
(2), we try to find E = argmax £ Q(E), where Q(E) is the 
function maximizing the lower bound in Equation (4). 
Since higher values on P(E) are weighed more in the 
approximation by Q(E) for r < 1, the better approxima- 
tion is expected for the higher values. This property on 
the limited case is shown in Proposition 1. As is in the 
variational method and the EM algorithm, the maxi- 
mum of the lower bound in Equation (4) is searched by 
a hill climbing from high temperature r > 1. 

In the hill climbing, Q(X), Q(0), and Q(E) are alter- 
nately updated from the following equations: 

Q(E) cx exp Q j dX j d0Q(X)Q(0) logP(E|X, ®)j , 

Q(X)ocexp(y dE j d0Q(E)Q(0)logP(X|0,E)^, 



Q(0) cx exp 



(/-/ 



dXQ(E)Q(X) logP(0|X, 



&\X,E)j 



Gradually converging r to 0, local optimum of the 
lower bound and corresponding Q(E), Q(X), and Q(0) 
are obtained. 

Effectiveness of variational annealing 

As alternatives of the variational annealing, we may con- 
sider the variational method and the EM algorithm 
where X is the set of hidden variables and E is handled as 
the set of parameters to be maximized. We show the 
effectiveness of variational annealing compared to the 
variational method and the EM algorithm under the fol- 
lowing conditions: P(X, 0, E) is factorized into P(X, E)P 
(0, E), and P(Ei\X, 0, E\{Ej}) is given as a binomial distri- 
bution, where E t is the /th element of E. If factorization 
of P(X, 0, E) = Q(X)Q(0)Q(£) is assumed in the calcula- 
tion of the variational method, arg max £ Q(E) is not the 
optimal solution of Equation (2) in general. In the EM 
algorithm, by allowing E to move around a j?-dimensional 
continuous space (0, 1) p ,E E m = argmax E€ ( 0/ i) m ^(^) can 
be calculated. Let £ EM j - be the ith element of £ EM . 
Usually, E EM/ i is mapped to 1 if E EM [ > 0.5 and 0 other- 
wise for discretizing £ EM to the ^-dimensional binary 
space {0,1}^, but such a mapping is not guaranteed to 
provide argmax £G { 0 ,i} m P(E). Although other mappings 
can be considered, to the best of our knowledge, no map- 
ping is guaranteed to provide the optimal solution in 
polynomial time of p. 

In the following, we prove a proposition in order to 
show that the variational annealing possibly give the 
optimal solution of Equation (2) even if the factorization 
of Q(E) = Ylu=i Q{Ei) * s additionally considered. 



Proposition 1. P(X,0,E) is factorized into P(0,E)P(X, 
E), and P(£;|X,0,£\{£J) is given as a binomial distribu- 
tion. Let Q(E[)be Q{E t ) maximizing the lower bound of 
the variational annealing for r — > +0 given by 



j dE x ...j dE p j dX j d®Q{X)Q{&)Y\Q{E l )log 



P{X, ®, E) 



Q(x)Q(®)YliQm 



(5) 



Then, the set of E t e {0,1} maximizing E EM /s 
argmax EG{01} pP(E). 

For the proof of the proposition, see Section 1 in 
Additional file 1. From Proposition 1, if the factorization 
P(X, 0, E) = P(0, E)P(X t E) is satisfied and optimal Q 
functions are found, the variational annealing is guaran- 
teed to provide the optimal solution of Equation (2) 
while the variational method and the EM algorithm are 
not. Although the factorization is not a generally satis- 
fied property, the factorization is often assumed in 
approaches based on the variational method, and the 
assumption usually works as good approximations. 
Thus, the variational annealing is expected to provide 
the better performance than the variational method and 
the EM algorithm even if the factorization is not satis- 
fied exactly. 

Procedures of variational annealing on proposed model 

In the variational annealing on the proposed model, we 
calculate Q functions for hidden variables X, parameters 
0, and binary variables E iteratively while cooling tem- 
perature r to zero gradually at each iteration cycle. In 
the following, we show the calculation procedures of Q 
(X), Q(0), and Q(E) on the proposed model as varia- 
tional E-step, variational M-step, and variational A-step, 
respectively. More details of the procedures are given in 
Additional file 1. For the notational brevity, we denote 
the expectation of a value x with a probability distribu- 
tion Q{y) as (x) Q(y y 
Variational E-step 

Parameters of Q(X) are mean of x t , variance of x t , and 
cross time variance of x t _ x and x t . These parameters can 
be calculated via variational Kalman filter by using fol- 
lowing terms expected with Q(0)Q(E): (E (c) )q( 0 )q( £) , (A) q 
(&)Q(E) , (H 1 A o E (C) ) Q(0)Q(£) , and ((A o E^H^A o £% 
(©)Q(£). For the details of variational Kalman filter, see [4]. 
From the parameters of Q(X), expectations of the follow- 
ing terms with Q(X) required in other steps are calcu- 

lated: (* t W ) QpQ , (* t W )'>Q(x> and (4 c) )'} Q{ x> 
Variational M-step 

Q(0) is factorized into n,Q(A,|/z,) Q(/z,)Q(r / ) II/Qfo/), 
where A t is a vector given by (A a , A ip )\ From the 
design of the proposed model, Q{A t \hi) f Q(/?/), Q(r,), and 
Q(Zij) are given in the following form: 
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Q{A i \h i )=M(A i ; ( i Ai ,h i T^), 
Q{h i )=ig{h i ;u i ,k i ), 
Q{n)= Win; Vi,li), 
Q(z ij ) = B^ ij0 ,^, l ). 

Parameters for the above / functions are calculated by 
using (4 c) )q(x), (4 C) (4 C) ) >Q(X)/ (4+1 (4 C) ) >Q(X)* and 

Variational A-step 

For the calculation of Q(£), we assume the factorization 
of Q(£) to ]~[ c Q(E^) in order to make the computa- 
tion tractable. Q(E^) follows a binomial distribution 
that takes one with probability ef^ and zero with prob- 
ability 1 — e\- \ and thus the expectation of with Q 
(E) is given by ej^l For the preparation, we calculate 
(A) Q(0)/ (H- 1 A) Q(0)/ (A'H- 1 A) Q(0) , (4 C) ) Q (X), (4 c) (^ ) )'>Q(x) 
and (^+1 )q(x> Q(^) * s tnen iteratively calculated 

by using these expected terms as well as (E^)q(e) ^ or ^ c 
* /. A few iterations are enough for the convergence. 
Update and selection of hyperparameters 
The proposed model contains u 0 , k 0 , v 0 , /o> do> d i> (%o> 
and a x as hyperparameters. a 0 and a x should be a 0 <a x 
as in the model setting, but this condition can be vio- 
lated in the update step of the variational method. Thus, 
we select a 0 and a x by cross validation, and update 
other hyperparameters as in the variational method. 

We first consider update of hyperparameters u 0 , k 0 , v 0 , 
lo> do> and Ca to increase the lower bound of marginal 
probability. u 0 and k 0 are updated by maximizing the 
following equation: 



•gmaxV ( Q(hi)logZ0(hi;u o ,feo)<ffii 

(M 0 ,fe 0 ) ; J 



{u 0 ,k 0 ) = aij 



= arg max(u 0 - 1) — + u 0 logfe 0 - fe 0 — -logr(u 0 ). 

(uo,feo) P P 

u 0 and fc 0 are obtained by numerical optimization 
methods such as the Newton-Raphson method. v 0 and 
l 0 are also updated in a similar manner to u 0 and k 0 . Cio 
and Cn are updated by solving the following equation: 



(&0/&1) = arg max V / Q(z§] 

fiO-fil ^. J 



= argmax(£i - 1) 



y)logtf(Zy;? i0 ,£l)^i; 

5,(log(l-q,)) 
+ Ifio - 1J 



f i0 and £ fl can also be obtained by the Newton-Raph- 
son method. 

For the selection of a 0 and (Xi, we set OL\ to some 
large value and select a 0 by a leave one out cross valida- 
tion procedure. For condition c e {1, 2} and time point 
t G 7^), we remove from data set y and use the data 



set to train the model. We calculate square sum of resi- 
dues rf )C between and the prediction of estimated 
from the variational Kalman filter on the trained model 



given by rf c = {yf - x\ )'{Yt — x t } B Y S ri< ^ search on 
parameter space of a 0 , we select a 0 that minimizes 

lt ' obs 

Summary of procedures 

The procedures for estimating parameters in the pro- 
posed model are summarized as follows: 

1. Set r to some large value. Also set a 0 to a small 
value and a x to a large value satisfying that a 0 <a v 

2. Initialize other hyperparameters and hidden 
variables. 

3. Perform the following procedures: 

(a) Calculate variational M-step. 

(b) Update hyperparameters. 

(c) Calculate variational E-step. 

(d) Calculate variational A-step. 

(e) Go back to step (a) until some convergence 
criterion is satisfied. 

4. Divide r by some value > 1 such as 1.05. 

5. Go back to step 3 if r is larger than some very 
small value > 0. 

In our setting, a x is set to 1,000. For the initialization 
of r and other hyperparameters, we use the following 

Settings: r = 2.5, Ef = 0.5, u { = 1, h = 1, v { = 1, h = 1, & 0 = 10, and fti = 10. 

If y( c ) is observed, we initialize x ^ with Otherwise, 
we use the linearly interpolated one. 

Results and discussion 

Performance evaluation by Monte Carlo experiments 

For the evaluation of the proposed approach, we gener- 
ate two linear regulatory network models with similar 
topological structures Gi and G 2 based on a linear regu- 
latory network model G 0 . G 0 is prepared in the follow- 
ing manner: (i) a scale free network of 100 nodes and 
150 edges is generated; (ii) edge directions are assigned 
randomly; (iii) autoloop edges are added to root nodes 
of the directed network; and (iv) AR coefficients for the 
directed edges are chosen randomly from {-0.9, -0.8, 
-0.7, -0.6, -0.5, 0.5, 0.6, 0.7, 0.8, 0.9}. We then generate 
Gi and G 2 from G 0 as follows: (i) autoloop edges and 
70% of non-autoloop edges in G 0 are used for com- 
monly existing edges in Gi and G 2 ; and (ii) the other 
30% of non-autoloop edges are randomly assigned as 
either Gi or G 2 specific edges. Note that AR coefficients 
on edges of Gi and G 2 are preserved, i.e., if a regulation 
from gene j to gene i exists in Gi or G 2 , then its coeffi- 
cient is the same as that of the regulation from gene j 
to gene i in G 0 . 

Figure 2 gives graph structure of Gi and G 2 , where 
commonly regulations, Gi specific regulations, and G 2 



Kojima et al. BMC Genomics 2012, 13(Suppl 1):S6 
http://www.biomedcentral.com/1471-2164/13/S1/S6 



Page 8 of 14 




9* 



Figure 2 The graph structures of G n and G 2 . Commonly regulations, G ] specific regulations, and G 2 specific regulations are represented with 
black, red, and green arrows, respectively. 



specific regulations are represented with black, red, and 
green arrows, respectively. 

From each of Gi and G 2 , we obtain two equally 
spaced time series data of 25 time points and 50 time 
points. For system noise and observation noise, normally 
distributed values with mean 0 and standard deviation 1 
and mean 0 and standard deviation 0.1 and 1 are used, 
respectively. The signal-noise ratio for system noise with 
standard deviation 1 and observation noise with stan- 
dard deviation 0.1 is 0.03dB, and the signal is bit stron- 
ger than the noise. On the other hand, for observation 
noise with standard deviation 1, the signal-noise ratio is 
-0.26dB. In the condition, the noise is stronger than the 
signal, and the noise level is quite high. 
Comparison between variational annealing and EM 
algorithm 

We first compare the performances of the proposed 
approach and the approach that is based on the pro- 
posed approach but uses the EM algorithm instead of 
the variation annealing using the equally spaced time 



series data of 50 and 25 time points on the system noise 
with standard deviation 1 and observation noise with 
standard deviation 0.1. From the comparison, we verify 
the effectiveness of the variational annealing, compared 
to the EM algorithm. Table 1 summarizes the results of 
the proposed approach and the EM algorithm based 
approach. For the EM algorithm, the regulation from j 
to i on condition c is considered to exist if the estimated 
E$ is more than 0.5. 

From the comparison, the results of the proposed 
approach contain more true positives than those of the 
EM algorithm based approach except for identifying 
changes on regulations for time points 50. For identify- 
ing changes on regulations, the EM algorithm based 
approach estimates bit more true positives than the pro- 
posed approach, but the difference is so small that it 
can be ignored. On the other hand, the results of the 
EM algorithm based approach contain more false posi- 
tives than those of the proposed approach, and hence 
the precision of the results by the EM algorithm is 
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worse than that of the proposed approach. Therefore, 
the effectiveness of the variational annealing is con- 
firmed in the computational experiment as well 
Comparison between proposed approach and existing 
approaches 

We employ the elastic net based VAR model approach 
[2] and the dynamic Bayesian network based approach 
termed G1DBN [8] as existing approaches for estimating 
networks from time series data. For the experiments, 
two versions of these approaches are considered: ENetl 
and ENet2 from the elastic net based VAR model 
approach, and G1DBN1 and G1DBN2 from G1DBN. 
These approaches are different in the following point: 
ENetl and G1DBN1 estimate Gi and G 2 independently, 
i.e., for the estimation of G 1} only time series data from 
Gi is used, while ENet2 and G1DBN2 assume that Gi 
and G 2 have the same network structure and estimate a 
network by using two time series data. Thus, ENet2 and 
G1DBN2 are considered to more use data sample than 
ENetl and G1DBN1 for network estimation, but 
changes on regulations between Gi and G 2 are not con- 
sidered. For selection of hyperparameters in ENetl and 
ENet2, AICc is used [2], and for hyperparameters a x 
and a 2 of G1DBN1 and G1DBN2, a setting of a x = 0.1 
and a 2 - 0.0059 considered in [8] is used. 

For the comparison of these approaches, we focus on 
the following two points: the number of correctly 



estimated regulations and the number of correctly esti- 
mated changes on regulations. The former is usually 
considered for evaluating the performance of gene net- 
work estimation methods. The numbers of true positives 
and false negatives of the estimated regulations are sum- 
marized in Table 2(a). The precisions of the results 
given by 

The number of true positives 
The number of true positives + The number of false positives 

are also provided. The results are averaged on ten data 
sets. The number of regulations in the true network 
models of G x and G 2 are in total 305. For the latter 
point, we consider the estimated regulations existing 
only in one of two estimated networks as changed regu- 
lations, and check if they correctly exist only in the cor- 
responding true network. The numbers of true positives 
and false negatives of the estimated changes on regula- 
tions and the precisions are summarized in Table 2(b). 
The results are also averaged on ten data sets. The 
number of true changes on regulations between in Gi 
and G 2 are 47, i.e., the number of true positives on this 
case is at most 47. 

For the estimation of the regulations in Table 2(a), the 
proposed approach outperforms other approaches in 
terms of true positives. The proposed approach contains 
more false positives than ENetl, G1DBN1, and 



Table 2 A summary of results for system noise with standard deviation 1 and observation noise with standard 
deviation 0.1 



(a) 









Equally spaced 
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# of time points 




50 






25 






50 






25 






# TP 


# FP 


PRE 


# TP 


# FP 


PRE 


#TP 


# FP 


PRE 


# TP 


# FP 


PRE 


Proposed 


295.9 


41.7 


0.88 


238.4 


71.6 


0.77 


262.4 


42.1 


0.86 


110.7 


37.2 


0.75 


ENetl 


246.3 


119.7 


0.67 


109.2 


67 


0.62 


84.7 


140.6 


0.38 


20.3 


70.4 


0.22 


ENet2 


277.9 


130.9 


0.68 


212.8 


130 


0.62 


169.7 


241.5 


0.41 


65.5 


132.5 


0.33 


G1DBN1 


223.7 


48 


0.82 


99.9 


46.2 


0.68 


65.1 


83.1 


0.44 


19.3 


72.7 


0.21 


G1DBN2 


268.8 


83.4 


0.76 


188.1 


64.5 


0.74 


134.8 


104.4 


0.56 


46.7 


85.7 


0.35 
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Unequally spaced 
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50 
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# TP 


# FP 
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# TP 


# FP 
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#TP 


# FP 
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# TP 


# FP 
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Proposed 


39.8 


13.2 


0.75 


23.4 


20.8 


0.53 


31.2 


16.5 


0.65 


5.5 


15.9 


0.26 


ENetl 


38.5 


153.1 


0.2 


18.6 


113 


0.14 


12.3 


186.6 


0.06 


3.7 


85 


0.04 


ENet2 


























G1DBN1 


35.6 


88.9 


0.29 


16.8 


91.9 


0.15 


10.3 


121.9 


0.08 


2.4 


87.8 


0.03 


G1DBN2 



























(a) The number of true positives (# TP) and false positives (# FP) of estimated regulations in two network model by the proposed approach, ENetl, ENet2, 
G1DBN1, and G1DBN2 for equally and unequally spaced time series data. PRE denotes the precision of the results. Regulations in two networks are 305 in total. 

(b) The number of true positives (# TP) and false positives (# FP) of changes on regulations between two network models estimated by the proposed approach, 
ENetl, ENet2, G1DBN1, and G1DBN2 for equally and unequally spaced time series data. Since no changes are estimated by ENet2 and G1DBN2, their results are 
indicated by The regulations changed in two networks are in total 47. 
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G1DBN2 on the data of 25 time points. We further con- 
sider these cases in terms of the precision. The preci- 
sions of the proposed approach, ENetl, G1DBN1, and 
G1DBN2 are given by 0.77, 0.62, 0.68, and 0.74, respec- 
tively. From this analysis, the proposed approach shows 
better performance than ENet2, G1DBN1, and G1DBN2 
on the whole. More true positives are estimated by 
ENet2 than ENetl, and the precisions of ENet2 tend to 
be better than those of ENetl. This type of relationship 
is also observed between G1DBN1 and G1DBN2. Also, 
the elastic net based VAR model approaches estimate 
more true positives than the approaches from G1DBN. 
However, in the precision, the approaches from G1DBN 
are better than the elastic net based VAR model 
approaches. From the results in Table 2(b), we see that 
the proposed approach estimates more true changes on 
regulations than ENetl and G1DBN1 on both data of 25 
and 50 time points. In addition, the results of the pro- 
posed approach contain less false positives than those of 
ENetl and G1DBN1. No changes on regulations are 
detected in ENet2 and G1DBN2 as topological differ- 
ences are ignored in these approaches. Since the pro- 
posed approach considers differences on network 
structures as well as uses two time series data efficiently, 
it can provide better results than other approaches on 
both estimating regulations and identifying changes on 
regulations. 

One may think it is strange that false positives in 
ENetl and G1DBN1 in Table 2(b) is more than those in 
Table 2(a), but this case can occur from the following 
reason. If a regulation exists in both of the true network 
models of Gi and G 2 , but is estimated only for G 1} then 
the case is not counted as a false positive in Table 2(a) 
while it is counted as a false positive in Table 2(b). 
Thus, the number of false positives in Table 2(b) can be 
greater than those in Table 2(a). 

In order to show the performance in unequally spaced 
time series data, we generate unequally spaced time ser- 
ies data of 25 and 50 observed time points. For time ser- 
ies data of 25 observed time points, we first generate 
equally spaced time series data of 40 time points and 
divide it into three blocks: 15 time points, 10 time 



points, and 15 time points. We then remove time points 
in the following manner: no time point is removed in 
the first block; one of every two time points are 
removed in the second block; and two of every three 
time points are removed in the third block. Figure 3 
shows the time point schedule of time series data 
obtained in this process. For time series data of 50 
observed time points, we first generate equally space 
time series data of 80 time points, divide it into three 
blocks: 30 time points, 20 time points, and 30 time 
points. Then, some time points are removed in a similar 
manner. We apply the proposed approach, ENetl, 
ENet2, G1DBN1, and G1DBN2 to the unequally spaced 
time series data. Results for the dataset are also sum- 
marized in Tables 2(a) and 2(b). From the comparison 
of results on equally and spaced time series data, the 
results of the proposed approach from unequally spaced 
time series data are worse than those from equally 
spaced one even with the same number of observed 
time points. However, results of ENetl, ENet2, 
G1DBN1, and G1DBN2 are worsened more than those 
of the proposed approach. This is probably because 
unequally spaced time points break their assumption, 
and their estimation process is misled. 

We also consider the time series data with the high 
level noise: system noise with standard deviation 1 and 
observation noise with standard deviation 1. The results 
for the case are summarized in Tables 3(a) and 3(b). 
The proposed approach shows the better performance 
on the number of true positives and precisions than 
other approaches except for the identification of the 
changes on regulations from the equally spaced time 
series data of 50 time points. For equally spaced time 
series data of 50 time points, the number of true posi- 
tives on the changes on regulations estimated by the 
proposed approach is more than that of G1DBN1, but 
less than that of ENetl. However, the precisions on the 
estimated changes by both ENetl and G1DBN are much 
worse than the proposed approach. Thus, overall, the 
proposed approach is more effective than other meth- 
ods. Although the proposed approach provides the bet- 
ter performance than other methods, the results of all 



1 st block: 15 time points 2 nd block: 5 time points 3 rd block: 5 time points 





0 12 ... 13 14 
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Every 2nd point is observed 
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Figure 3 A time point schedule on unequally spaced time series data in the Monte Carlo experiment. Observed points in the time 

schedule are indicated by arrows. 15 time points are equally spaced in first block, every second point is observed in second block comprised of 

5 observed time points, and every third point is observed in third block comprised of 5 observed time points. 
I ) 
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Table 3 A summary of results for system noise with standard deviation 1 and observation noise with standard 
deviation 1 
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121.0 


0.42 


132.1 


675.5 


0.66 


52.1 


108.9 


0.33 
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75.9 


0.29 
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0.09 
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0.51 


22.6 


90.8 


0.2 


26.3 
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0.26 


4.7 


50.0 


0.09 


8.1 


16.7 


0.33 


3.1 


42.5 


0.07 
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ENet2 
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0.02 
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(a) The number of true positives (# TP) and false positives (# FP) of estimated regulations in two network model by the proposed approach, ENetl, ENet2, 
G1DBN1, and G1DBN2 for equally and unequally spaced time series data. PRE denotes the precision of the results. Regulations in two networks are 305 in total. 

(b) The number of true positives (# TP) and false positives (# FP) of changes on regulations between two network models estimated by the proposed approach, 
ENetl, ENet2, G1DBN1, and G1DBN2 for equally and unequally spaced time series data. Since no changes are estimated by ENet2 and G1DBN2, their results are 
indicated by The regulations changed in two networks are in total 47. 



the approaches are worsened due to the high level noise, 
and the differences on the performance among the 
approaches get smaller, compared to the case of obser- 
vation noise with standard deviation 0.1. 

Analysis of time series microarray data from Human small 
airway epithelial cells 

We apply the proposed approach to two time series 
microarray gene expression data from normal Human 
small airway epithelial cells (SAECs) and SAECs treated 
by stimulating EGF-receptors and dosing an anticancer 
drug termed Gefitinib. EGF-receptors are often overex- 
pressed in lung cancer cells such as tumoral SAECs, and 
a lung cancer condition is simulated in the treated 
SAECs by stimulating EGF-receptors. Since Gefitinib is 
known as a selective inhibiter of EGF-receptors, the sti- 
mulation of EGF-receptors would be counteracted by 
Gefitinib, and the condition of treated SACEs should be 
the same as that of normal SAECs in theory. However, 
since some gene expression patterns are different 
between the two conditions in practice, some unknown 
effects by Gefitinib may be involved in the phenomenon. 
Thus, we focus on changed regulations between the 
gene networks estimated from gene expression data in 
these two conditions in order to find some insights on 
the unknown effects of Gefitinib. 

For gene set selection, we first screen 500 genes from 
the ranking of the gene list sorted by coefficient 



variation [5]. We then select 100 genes with highly var- 
ied expression profiles between normal SAECs and trea- 
ted SAECs. The time series gene expression data from 
both types of cells are comprised of spaced 14 time 
points in 48 hours. The time schedule of 14 time points 
are {0 h, 6 h, 9 h, 12 h, 15 h, 18 h, 21 h, 24 h, 27 h, 30 
h, 33 h, 36 h, 39 h, 48 h}. For the analysis on the pro- 
posed approach, we set interval on system model to 
three hours. 



Table 4 Changes on regulations between normal and 


treated SAECs 




Normal SAECs 


Treated SAECs 


ZC3HAV1L-> FOXA2 


Prss22 foxn2 


LIF -> foxn2 


Prss22 -> cdk14 


Cdc42ep2 Spink6 


Prss22 -» Camk2n1 


Siglecl 5 -> NTN1 


Prss22 -» cttn 


HAS3 -> HAS3 


Prss22 -> Sfrs6 


HAS3 -> End 


Prss22 -> ITGA2 


HAS3 -> LEPREL1 


Prss22 pkn2 




Prss22 -> Hs2st1 




Prss22 -> FILIP1L 




Prss22 -» Hcn2 




Prss22 -> KLF16 




Ktelcl -> NTN1 




Tm6sf1 -> Siglecl 5 



Estimated regulations only in normal or treated SAECs are listed in the left 
side or right side, respectively. 
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The estimated networks from time series gene expres- 
sion data in normal SAECs and treated SAECs are sum- 
marized and given in Figure 4. Black arrows, red arrows, 
and green arrows indicate regulations in both condi- 
tions, only in normal SAECs, and only in treated 
SAECs, respectively. Table 4 gives a list of the estimated 
regulations only in normal SAECs or in treated SAEC 

From Table 4, we see that Prss22 is involved in most 
of the regulations only in treated SAECs. Prss22 is a 



tryptase, one of serine proteases, and its relationship 
with the airways is suggested by a report about its 
expression in the airways in a developmentally regulated 
manner. Tryptase is a potent mitogen of fibroblast [15], 
and it is reported that the increase and activation of 
fibroblast are promoted in lung cells under the condi- 
tion of interstitial pneumonia. Interstitial pneumonia is 
known as a side effect of Gefitinib, and these findings 
suggest that Prss22 is an off-target of Gefitinib and is 
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Figure 4 An estimated gene network by the proposed approach from time series gene expression data on normal SAECs and treated 
SAECs. In the estimated network, regulations in both conditions are in black, and regulations only in normal SAECs and treated SAECs are in 
red and green, respectively. 



Kojima et al. BMC Genomics 2012, 13(Suppl 1):S6 
http://www.biomedcentral.eom/1 471 -21 64/1 3/S1 /S6 



Page 13 of 14 



possibly related to the side effect of Gefitinib. Also, 
FILIP1L, a gene estimated as one of targets of Prss22 in 
treated SAECs, is reported as an inhibitor of cell prolif- 
eration of fibroblast [16]. This observation supports the 
relation between Prss22 and fibroblast as well 

We also focus on other several genes related to 
changes on regulations in normal and treated SAECs. 
LIF, leukemia inhibitory factor, is known to affect cell 
growth and development. Gefitinib is also known to be 
effective for acute myelogenous leukemia via Sky, which 
is an off- target gene of Gefitinib [17]. In addition, Wang 
et al. reported that LIF prolongs the cell cycle of stem 
cells on acute myelogenous leukemia lines [18]. 
Although, to the best of our knowledge, no direct influ- 
ence from Gefitinib to LIF is reported, the above facts 
suggest some relation between Gefitinib and LIF, and 
support changes on regulations of LIF between normal 
and treated SAECs. 

Heikema et al. reported that Human Siglecs, Siglecs- 
14, -15, and -16 interact with transmembrain adaptor 
proteins containing the immunoreceptor tyrosine-based 
activation motif such as DAP12 [19], and therefore they 
potentially mediate the activation of intracellular signal- 
ing. Gefitinib is a selective inhibitor of tyrosine kinase, 
and the inhibition of tyrosine kinase is considered to 
affect the regulations around Siglec-15. 

Although the stimulation of EGF-receptors in the trea- 
ted SAECs is considered to be counteracted by Gefitinib, 
the expressions of some genes may be affected by the 
stimulation in practical conditions. HAS3 is related to 
synthesis of the unbranched glycosaminoglycan hyaluro- 
nic acid and is reported to be up-regulated by EGF [20]. 
The stimulation of EGF-receptors affects the amount of 
EGF taken into cells, and hence the stimulation is 
expected to cause the changes on the regulations around 
HAS3. Foxn2 is a member of family of Fox proteins. Fox 
proteins are known to play important roles on control- 
ling the expressions of genes related to cell growth, pro- 
liferation, and differentiation. Some members of Fox 
family are related to EFG-receptors, e.g., the expression 
of Foxnl is suppressed by EGF-receptor signaling [21] 
although no direct relation between EGF-receptors and 
Foxn2 is found. FOXA2 is also a member of family of 
Fox proteins. EGF-receptor signaling is known to 
decrease the expression of FOXA, which prevents the 
mucus production [22]. [23] reported the relation 
between EGF-receptors and FOXA2 in the airways of 
asthmatic patients. Thus, it appears that the change on 
the regulation related to FOXA2 is caused by the stimu- 
lation of EGF-receptors. 

Conclusions 

We proposed the new computational model that is 
based on VAR-SSM and estimates gene networks from 



time series data on normal and treated conditions as 
well as identifies changes regulations by the treatment. 
Unlike many of existing gene network estimation 
approaches assuming equally spaced time points, our 
approach can handle unequally spaced time series data. 
The efficient use of time series data is achieved by 
representing the presence of regulations on each condi- 
tion with hidden binary variables. Since finding the opti- 
mal configuration of the hidden binary variables on the 
proposed model is computationally in tractable, we 
derive the extended variational annealing method in 
order to address the problem as the alternative method. 

In the Monte Carlo experiments, we use equally and 
spaced time series data from synthetically generated two 
regulatory networks whose structures are different in 
several regulations, and verified the effectiveness of the 
proposed model in both estimation of regulations and 
changes on regulations between the two conditions, 
compared to existing methods. 

As the real data application, we use the proposed 
approach to analyze two time series data from normal 
SAECs and SAECs treated by stimulating EGF-receptors 
and dosing Gefitinib. From genes related to changes on 
regulations by the treatment, we find possible off-target 
genes of Gefitinib, and one of these genes is suggested 
to be related to a factor of interstitial pneumonia, which 
is known as a side effect of Gefitinib. In this study, we 
consider changes on regulations in two conditions, but 
the proposed approach can be extended to identifying 
changes among more than two conditions. 

Additional material 

c >i 

Additional file 1: Proof of Proposition 1 and more details on the 
procedures of variational annealing. A proof of Proposition 1 and 
more details on the procedures of variational annealing on the proposed 
model are described. 
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